A modified version of diagonal systematic sampling in the presence of linear trend

Systematic sampling is one of the simplest and most popular methods for selecting a random sample from a finite population. The diagonal systematic sampling scheme is a type of systematic sampling design that has gained the attention of researchers during the last two decades. In this paper, a modification of the conventional diagonal systematic sampling design is proposed for situations where the population units follow a linear trend. The proposed strategy is found to reduce the variance of diagonal systematic sampling, resulting in a more efficient sampling design. The mathematical conditions under which the suggested modified diagonal systematic sampling design is more precise than some of the available sampling designs are derived. A numerical illustration using milk yield data shows that the proposed sampling scheme is more efficient than some of the available sampling schemes.


Introduction
In survey sampling, the linear systematic sampling scheme originally developed by Madow and Madow [1] is used to obtain a sample of n units from a finite population of N units in such a way that the first unit is selected from the first k (= N/n) units and every kth unit thereafter is systematically included in the sample. A limitation of the usual linear systematic sampling procedure is that it requires the population size to be a constant multiple of the required sample size. To cope with this issue, Lahiri [2] introduced the circular systematic sampling scheme. Mukerjee and Sengupta [3] proposed some optimal sampling strategies for estimating the mean. Chang and Huang [4] developed a modified version of systematic sampling, popularly known as remainder systematic sampling, for use in situations where the population size is not a constant multiple of the required sample size. The concept of the diagonal systematic sampling scheme as an alternative to classical systematic sampling was introduced by Subramani [5]. Sampath and Varalakshmi [6] introduced a new modified systematic sampling method known as the diagonal circular systematic sampling scheme. Subramani [7] developed a generalized form of the original diagonal systematic sampling scheme. Another modified version of linear systematic sampling, for use in situations where the sample size is odd, was suggested by Subramani [8] and was found to be more efficient than the linear systematic sampling scheme. Khan et al. [9] proposed a systematic sampling scheme under equal probability sampling that generalizes both the linear and circular systematic sampling schemes. A generalization of the usual systematic sampling method was suggested by Subramani and Gupta [10], which improved on linear systematic sampling in terms of efficiency.
The method suggested by Subramani and Gupta [10] was practically more useful than linear systematic sampling as it did not require the population size to be a constant multiple of the required sample size. However, its limitation was that the sample mean based on this sampling scheme was a biased estimator of the finite population mean. Subramani and Singh [11] developed an optimal form of circular systematic sampling for populations following a linear trend. Naidoo et al. [12] developed a new modified version of balanced systematic sampling. The comparative performance of circular systematic and simple random sampling under a linear trend was studied by Subramani [13]. Gupta et al. [14] introduced a new modification of systematic sampling based on multiple random starts. Recently, Azeem et al. [15] proposed a new systematic sampling design for estimating the population mean. Further studies related to systematic sampling can be found in Madow [16], Yates [17], Bellhouse and Rao [18], Bellhouse [19], Fountain and Pathak [20], Sampath and Uthayakumaran [21], Subramani [22], Khan et al. [23], and Naidoo et al. [24]. In this paper, an efficient modified diagonal systematic sampling method is proposed for situations where the population units follow a linear trend. The mathematical conditions under which the suggested sampling design is more efficient than some of the existing sampling schemes are derived. It is shown that the new systematic sampling scheme is more precise than some of the existing sampling schemes.

Proposed modified diagonal systematic sampling scheme
Let the population consist of N units with labels 1, 2, 3, . . ., N, and suppose a sample of size n is required such that N = nk = (n − 1)k + k. Motivated by Subramani [8], a modified version of diagonal systematic sampling is proposed. While the conventional diagonal systematic sampling selects a sample of size n from the whole population regarded as a single group, the proposed method separates the last k units from the rest of the population. Thus the population is divided into two groups, where the first n − 1 units of the required sample are selected from the first group and the last unit is selected independently from the second group. Such a partition of the population into two disjoint groups results in a more efficient sampling scheme, as shown in Sections 5 and 6. The steps involved in the proposed method are as follows:
1. Partition the population into two sets, Set-1 and Set-2, in such a manner that Set-1 receives the first (n − 1)k units and Set-2 receives the remaining k units.
2. In Set-1, arrange the units in an (n − 1) × k matrix. In Set-2, arrange the k units in a row having units $y_{(n-1)k+1}, y_{(n-1)k+2}, \ldots, y_{nk}$ as shown in Table 1.
3. Obtain two random numbers $r_1$ and $r_2$, where $1 \le r_1 \le k$ and $1 \le r_2 \le k$. In Set-1, the units are drawn in such a way that the selected n − 1 units are the entries in the diagonal or broken diagonal of the matrix starting at column $r_1$. In Set-2, the unit in position $r_2$ is selected at random and is combined with the n − 1 units selected from Set-1 to complete the sample of size n.
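The three steps above can be sketched in a few lines of Python. This is a minimal illustration and not code from the paper: the function names are hypothetical, and the wrap-around indexing `(r1 - 1 + row) % k` is an assumed way of expressing "diagonal or broken diagonal" for an (n − 1) × k layout filled row by row.

```python
import random


def modified_diagonal_sample(n, k, r1, r2):
    """Return the 1-based population labels selected for random starts r1, r2.

    Assumes N = n * k, units laid out row by row, and n - 1 <= k so that a
    diagonal never visits the same column twice.
    """
    if not (1 <= r1 <= k and 1 <= r2 <= k):
        raise ValueError("r1 and r2 must lie in 1..k")
    if n - 1 > k:
        raise ValueError("the diagonal needs n - 1 <= k")
    labels = []
    for row in range(n - 1):
        # 0-based column on the diagonal; the modulo wraps to the first
        # column when the diagonal runs off the right edge (broken diagonal).
        col = (r1 - 1 + row) % k
        labels.append(row * k + col + 1)
    # Set-2 holds labels (n-1)k+1 .. nk; take the unit in position r2.
    labels.append((n - 1) * k + r2)
    return labels


def draw_sample(n, k, rng=random):
    """Draw r1 and r2 independently and uniformly from 1..k, then select."""
    r1 = rng.randint(1, k)
    r2 = rng.randint(1, k)
    return modified_diagonal_sample(n, k, r1, r2)
```

For instance, with n = 4 and k = 5 the start r1 = 5 produces a broken diagonal, because the second row wraps back to its first column.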
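It is clear that the suggested modified diagonal systematic sampling design has k × k = k² possible samples, each of size n. For the proposed method, the first and second order probabilities of inclusion are:

$$\pi_i = \frac{1}{k}, \qquad i = 1, 2, \ldots, N, \tag{1}$$

and

$$\pi_{ij} = \begin{cases} \dfrac{1}{k} & \text{if the } i\text{th and } j\text{th units are from the same diagonal or broken diagonal of Set-1;} \\[4pt] \dfrac{1}{k^2} & \text{if the } i\text{th and } j\text{th units are from Set-1 and Set-2 respectively;} \\[4pt] 0 & \text{otherwise.} \end{cases} \tag{2}$$

Generally, the selected sampling units are:

$$S_{r_1 r_2} = \left\{ y_{jk + ((r_1 + j - 1) \bmod k) + 1} : j = 0, 1, \ldots, n-2 \right\} \cup \left\{ y_{(n-1)k + r_2} \right\},$$

where $r_1 = 1, 2, \ldots, k$ and $r_2 = 1, 2, \ldots, k$. The term $(r_1 + j - 1) \bmod k$ wraps the diagonal back to the first column once it passes the last column, which gives the broken diagonal.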
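The sample mean based on the proposed systematic sampling scheme is given by:

$$\bar{y}_{ndsy} = w_1 \bar{y}_1 + w_2 \bar{y}_2, \tag{3}$$

where $w_1 = (n-1)/n$, $w_2 = 1/n$, $\bar{y}_1$ is the mean of the n − 1 units selected from Set-1, and $\bar{y}_2$ is the value of the unit selected from Set-2.

Theorem: Under the proposed sampling scheme, the sample mean can be written in the form of the Horvitz-Thompson estimator $\bar{y}_{HT}$ suggested by Horvitz and Thompson [25] and is unbiased with variance:

$$\mathrm{Var}(\bar{y}_{ndsy}) = \frac{1}{k}\left[ w_1^2 \sum_{r_1=1}^{k} \left(\bar{y}_{1 r_1} - \bar{Y}_1\right)^2 + w_2^2 \sum_{r_2=1}^{k} \left(y_{(n-1)k + r_2} - \bar{Y}_2\right)^2 \right], \tag{4}$$

where $\bar{y}_{1 r_1}$ is the mean of the Set-1 units selected for random start $r_1$, and $\bar{Y}_1$ and $\bar{Y}_2$ are the means of all units in Set-1 and Set-2 respectively.

Proof: By definition,

$$\bar{y}_{HT} = \frac{1}{N} \sum_{i \in s} \frac{y_i}{\pi_i} = \frac{1}{N} \left( \sum_{i \in s_1} \frac{y_i}{\pi_i} + \sum_{i \in s_2} \frac{y_i}{\pi_i} \right), \tag{5}$$

where $s_1$ and $s_2$ denote the samples drawn from Set-1 and Set-2 respectively. Since $\pi_i = 1/k$ for every unit and N = nk,

$$\bar{y}_{HT} = \frac{k}{N} \sum_{i \in s} y_i = \frac{1}{n} \sum_{i \in s} y_i = \bar{y}_{ndsy}, \tag{6}$$

where $s = s_1 \cup s_2$ denotes the set of all units selected in the sample. Taking expectation on both sides of (6) yields:

$$E(\bar{y}_{ndsy}) = \frac{1}{n} E\left( \sum_{i \in s} y_i \right), \tag{7}$$

that is,

$$E(\bar{y}_{ndsy}) = \frac{1}{n} \left[ E\left( \sum_{i \in s_1} y_i \right) + E\left( \sum_{i \in s_2} y_i \right) \right]. \tag{8}$$

Now,

$$E\left( \sum_{i \in s_1} y_i \right) = \frac{1}{k} \sum_{i \in S_1} y_i. \tag{9}$$

Similarly,

$$E\left( \sum_{i \in s_2} y_i \right) = \frac{1}{k} \sum_{i \in S_2} y_i, \tag{10}$$

where $S_1$ and $S_2$ denote the sets of all units in Set-1 and Set-2 respectively. Substituting (9) and (10) in (8) and simplifying yields:

$$E(\bar{y}_{ndsy}) = \frac{1}{nk} \sum_{i=1}^{N} y_i = \bar{Y}.$$

Taking variance on both sides of (3), and noting that $r_1$ and $r_2$ are drawn independently, yields:

$$\mathrm{Var}(\bar{y}_{ndsy}) = w_1^2 \, \mathrm{Var}(\bar{y}_1) + w_2^2 \, \mathrm{Var}(\bar{y}_2), \tag{11}$$

where

$$\mathrm{Var}(\bar{y}_1) = \frac{1}{k} \sum_{r_1=1}^{k} \left(\bar{y}_{1 r_1} - \bar{Y}_1\right)^2, \tag{12}$$

and

$$\mathrm{Var}(\bar{y}_2) = \frac{1}{k} \sum_{r_2=1}^{k} \left(y_{(n-1)k + r_2} - \bar{Y}_2\right)^2. \tag{13}$$

Substituting (12) and (13) in (11), the variance of $\bar{y}_{ndsy}$ is obtained as:

$$\mathrm{Var}(\bar{y}_{ndsy}) = \frac{1}{k}\left[ w_1^2 \sum_{r_1=1}^{k} \left(\bar{y}_{1 r_1} - \bar{Y}_1\right)^2 + w_2^2 \sum_{r_2=1}^{k} \left(y_{(n-1)k + r_2} - \bar{Y}_2\right)^2 \right]. \tag{14}$$

Remark 1: Using the Sen-Yates-Grundy approach suggested by Sen [26] and Yates and Grundy [27], the variance of $\bar{y}_{ndsy}$ can be written as:

$$\mathrm{Var}(\bar{y}_{ndsy}) = \frac{1}{2N^2} \sum_{i=1}^{N} \sum_{\substack{j=1 \\ j \ne i}}^{N} \left(\pi_i \pi_j - \pi_{ij}\right) \left( \frac{y_i}{\pi_i} - \frac{y_j}{\pi_j} \right)^2. \tag{15}$$

Remark 2: The Sen-Yates-Grundy estimator for (15) is given by:

$$\widehat{\mathrm{Var}}(\bar{y}_{ndsy}) = \frac{1}{2N^2} \sum_{i \in s} \sum_{\substack{j \in s \\ j \ne i}} \frac{\pi_i \pi_j - \pi_{ij}}{\pi_{ij}} \left( \frac{y_i}{\pi_i} - \frac{y_j}{\pi_j} \right)^2. \tag{16}$$

The values of $\pi_i$ and $\pi_{ij}$ from (1) and (2) can be used in expressions (15) and (16) to obtain the sampling variance of the mean and its estimator under the modified diagonal systematic sampling scheme. Moreover, since the second order inclusion probabilities are zero for some pairs of units, it is not possible to estimate $\mathrm{Var}(\bar{y}_{ndsy})$ unbiasedly. This drawback is shared by linear systematic sampling, Subramani's [8] method and the proposed method.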
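The unbiasedness claim and the first order inclusion probabilities can be checked numerically by enumerating all k² samples of the design. The sketch below is illustrative only: the helper names are hypothetical, the diagonal indexing is the same assumed wrap-around rule for a row-by-row layout, and exact rational arithmetic is used so that equalities can be tested without rounding.

```python
from fractions import Fraction
from itertools import product


def sample_labels(n, k, r1, r2):
    """1-based labels picked by the modified diagonal design (assumed indexing)."""
    labels = [row * k + (r1 - 1 + row) % k + 1 for row in range(n - 1)]
    labels.append((n - 1) * k + r2)
    return labels


def design_summary(y, n, k):
    """Enumerate all k*k samples; return inclusion probs, E(mean), Var(mean)."""
    if len(y) != n * k:
        raise ValueError("population size must equal n * k")
    counts = [0] * len(y)
    means = []
    for r1, r2 in product(range(1, k + 1), repeat=2):
        labels = sample_labels(n, k, r1, r2)
        for lab in labels:
            counts[lab - 1] += 1
        means.append(Fraction(sum(y[lab - 1] for lab in labels), n))
    # Every (r1, r2) pair is equally likely, so probabilities are counts / k^2.
    inclusion = [Fraction(c, k * k) for c in counts]
    expectation = sum(means) / len(means)
    variance = sum((m - expectation) ** 2 for m in means) / len(means)
    return inclusion, expectation, variance
```

Calling `design_summary` on any integer-valued population of size nk with n − 1 ≤ k should return inclusion probabilities all equal to 1/k and an expectation equal to the population mean, whatever the values of `y`.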

Linear trend
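Let the N = nk = (n − 1)k + k, k ≥ n − 1, units of the finite population follow a linear trend. That is,

$$y_i = a + b\,i, \qquad i = 1, 2, \ldots, N, \tag{17}$$

where a and b are constants. The variance of the mean based on simple random sampling in the case of linear trend is given by:

$$\mathrm{Var}(\bar{y}_r) = \frac{(k-1)(N+1)\, b^2}{12}. \tag{18}$$

The sampling variance of the mean in linear systematic sampling is given by:

$$\mathrm{Var}(\bar{y}_{sy}) = \frac{(k^2-1)\, b^2}{12}. \tag{19}$$

The variance of the mean in diagonal systematic sampling is:

$$\mathrm{Var}(\bar{y}_{dsy}) = \frac{(k-n)\left[ n(k-n) + 2 \right] b^2}{12n}. \tag{20}$$

The variance of the mean based on Subramani's [8] modified systematic sampling scheme, $\mathrm{Var}(\bar{y}_{msy})$, is given by expression (21) [expression (21) is not recoverable from the source text].

The variance of the sample mean in Subramani's [13] optimal circular systematic sampling is given by expression (22) [expression (22) is not recoverable from the source text].

Finally, the variance of the mean in the proposed method is obtained as:

$$\mathrm{Var}(\bar{y}_{ndsy}) = \left( \frac{n-1}{n} \right)^2 \mathrm{Var}(\bar{y}_1) + \left( \frac{1}{n} \right)^2 \mathrm{Var}(\bar{y}_2), \tag{23}$$

where $\bar{y}_1$ is the mean of the n − 1 units drawn diagonally from Set-1 and $\bar{y}_2$ is the unit drawn from Set-2. Since there are (n − 1)k units in Set-1, replacing n by n − 1 in (20) leads to:

$$\mathrm{Var}(\bar{y}_1) = \frac{(k-n+1)\left[ (n-1)(k-n+1) + 2 \right] b^2}{12(n-1)}. \tag{24}$$

Also, there are k units in Set-2, and since the right hand side of (19) is independent of n,

$$\mathrm{Var}(\bar{y}_2) = \frac{(k^2-1)\, b^2}{12}. \tag{25}$$

Using (24) and (25) in (23), the variance of $\bar{y}_{ndsy}$ in the case of linear trend is obtained as:

$$\mathrm{Var}(\bar{y}_{ndsy}) = \frac{b^2}{12 n^2} \left\{ (n-1)(k-n+1)\left[ (n-1)(k-n+1) + 2 \right] + k^2 - 1 \right\}. \tag{26}$$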
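The linear-trend variances can be tabulated and sanity-checked in code. The sketch below is not taken from the paper: it assumes the model y_i = a + b·i and uses closed forms derived here under that model for simple random, linear systematic, diagonal systematic and the proposed design. The formulas for Subramani's [8] and [13] schemes are not included, and all function names are hypothetical. The brute-force routine enumerates all k² samples of the proposed design so that its closed form can be compared against an exact calculation.

```python
from fractions import Fraction
from itertools import product


def var_srs(n, k, b=1):
    """Simple random sampling without replacement, N = n*k, linear trend."""
    return Fraction((k - 1) * (n * k + 1), 12) * b**2


def var_sys(n, k, b=1):
    """Linear systematic sampling under linear trend (does not depend on n)."""
    return Fraction(k * k - 1, 12) * b**2


def var_dsy(n, k, b=1):
    """Diagonal systematic sampling under linear trend (assumes n <= k)."""
    m = k - n
    return Fraction(m * (n * m + 2), 12 * n) * b**2


def var_proposed(n, k, b=1):
    """Proposed design: diagonal part on Set-1 plus one random unit of Set-2."""
    a = (n - 1) * (k - n + 1)
    return Fraction(a * (a + 2) + k * k - 1, 12 * n * n) * b**2


def brute_force_proposed(n, k, a=0, b=1):
    """Exact variance of the sample mean over all k*k samples, y_i = a + b*i."""
    y = [a + b * i for i in range(1, n * k + 1)]
    means = []
    for r1, r2 in product(range(1, k + 1), repeat=2):
        labels = [row * k + (r1 - 1 + row) % k + 1 for row in range(n - 1)]
        labels.append((n - 1) * k + r2)
        means.append(Fraction(sum(y[lab - 1] for lab in labels), n))
    centre = sum(means) / len(means)
    return sum((m - centre) ** 2 for m in means) / len(means)
```

Because the intercept a shifts every sample mean by the same amount, only b affects the variances, which is why the closed-form functions take no intercept argument.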

Efficiency comparison of the proposed systematic sampling with existing methods
If the units of the population follow a linear trend, the proposed modified diagonal systematic sampling design is more precise than the simple random sampling scheme if:

$$\mathrm{Var}(\bar{y}_{ndsy}) < \mathrm{Var}(\bar{y}_r). \tag{27}$$

Using (18) and (26) in (27), and writing $A = (n-1)(k-n+1)$, the condition reduces to:

$$A(A+2) + k^2 - 1 < n^2 (k-1)(nk+1). \tag{28}$$

The proposed sampling scheme is more efficient than the linear systematic sampling scheme if:

$$\mathrm{Var}(\bar{y}_{ndsy}) < \mathrm{Var}(\bar{y}_{sy}). \tag{29}$$

Using (19) and (26) in (29), the condition reduces to:

$$A(A+2) < (n^2-1)(k^2-1). \tag{30}$$

The proposed sampling scheme is more efficient than Subramani's [8] modified linear systematic sampling scheme if:

$$\mathrm{Var}(\bar{y}_{ndsy}) < \mathrm{Var}(\bar{y}_{msy}). \tag{31}$$

Using (21) and (26) in (31) and simplifying yields condition (32) [expression (32) is not recoverable from the source text].

The proposed sampling design is more precise than the diagonal systematic sampling design if:

$$\mathrm{Var}(\bar{y}_{ndsy}) < \mathrm{Var}(\bar{y}_{dsy}). \tag{33}$$

Using (20) and (26) in (33), condition (33) reduces to:

$$n(k-n)(k-2n+1) > n^2 + (k-n)^2 - 1. \tag{34}$$

Eq (34) usually holds if k is more than twice the value of n. Table 3 shows the improvement in efficiency for different choices of k and n.
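The statement that the comparison with diagonal systematic sampling usually favours the proposed design once k exceeds twice n can be explored numerically. The sketch below is only an illustration under stated assumptions: it uses closed-form linear-trend variances (with b = 1) derived here for the two designs rather than expressions quoted from the paper, and the function names are hypothetical.

```python
from fractions import Fraction


def var_dsy(n, k):
    """Diagonal systematic sampling, linear trend, b = 1 (assumes n <= k)."""
    m = k - n
    return Fraction(m * (n * m + 2), 12 * n)


def var_proposed(n, k):
    """Proposed modified diagonal design, linear trend, b = 1."""
    a = (n - 1) * (k - n + 1)
    return Fraction(a * (a + 2) + k * k - 1, 12 * n * n)


def relative_efficiency(n, k):
    """Values above 1 mean the proposed design has the smaller variance."""
    return var_dsy(n, k) / var_proposed(n, k)


def smallest_k_beating_dsy(n, k_max=1000):
    """First k >= n at which the proposed variance is strictly smaller."""
    for k in range(n, k_max + 1):
        if var_proposed(n, k) < var_dsy(n, k):
            return k
    return None
```

Under these assumed formulas the two variances coincide at k = 2n + 1, and the proposed design first becomes strictly better at k = 2n + 2 for the sample sizes checked, which is in line with the "more than twice n" rule of thumb.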

A numerical illustration using milk yield data
The milk yield data of S-19 brand of Sahiwal cows for 252 days from the date of calving was obtained from Pandey and Kumar [28]. From the daily observed milk yield of cows (in liters) as given in Pandey and Kumar [28], one can observe that with the passage of time, the milk yield decreases, leading to a linear trend in the data set.
The variances of different sampling schemes on the basis of the milk yield data are given in Table 2. It is clear that the proposed modified diagonal sampling scheme is more efficient than some of the available sampling designs, including both diagonal systematic sampling and Subramani's [8] sampling scheme. Since the population size is N = 252 units and systematic sampling requires that N = nk, a few units were deleted from the population for some choices of n and k so that N equals nk and the efficiency comparison is possible. For example, for n = 10 and k = 25, the last two units were deleted, reducing the population size to N = 250. In this case, N = 250 was used in the calculation of the variance for each sampling scheme so that the comparison is made under identical conditions.
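The trimming step described above is simple to express in code. The sketch below assumes that the units removed are always the trailing ones, as in the n = 10, k = 25 example; the text does not say which units were dropped for other choices of n and k, so this rule and the function name are assumptions.

```python
def trim_to_multiple(y, n, k):
    """Keep the first n*k observations so that N = n*k holds exactly.

    Assumption: surplus units are removed from the end of the series.
    """
    target = n * k
    if target > len(y):
        raise ValueError("n * k exceeds the number of available units")
    return list(y[:target])
```

With a series of 252 daily observations, `trim_to_multiple(y, 10, 25)` keeps the first 250 values, while a choice such as n = 12, k = 21 leaves the series unchanged.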

Conclusion
The variances of the mean based on simple random sampling, linear systematic sampling, diagonal systematic sampling, Subramani's [8] modified systematic sampling, Subramani's [13] optimal circular systematic sampling, and the proposed modified diagonal systematic sampling method for various choices of n and k are presented in Table 3. The values of n and k have been chosen in such a way that N = nk and k ≥ n − 1. Since the constant b² is a multiplier in the variance of every sampling scheme, b = 1 has been used to make the comparison simpler. The efficiency comparison has been made for small as well as large sample sizes. Since Subramani's [13] optimal circular systematic sampling requires that N = nk ± 1, the value N = nk + 1 is used in the calculation of its variance in Table 3 in order to make the comparison possible. For N = nk − 1, the variance of Subramani's [13] sampling scheme is almost the same as for N = nk + 1, so its values are presented only for N = nk + 1. It is clear that the suggested modified diagonal systematic sampling scheme is better than the other existing sampling schemes in terms of efficiency. This gain in efficiency makes the proposed modified diagonal systematic sampling scheme preferable to the existing sampling schemes in situations where the population units follow a linear trend.